FIGURE 6.5
Illustration of training $w_i^j$ via Expectation-Maximization. Weights that already obey one specific distribution, i.e., those lower than the minimum mean value or higher than the maximum mean value, are left free of the constraint. For the ones in the middle area (shown as non-transparent in the figure), we apply EM(·) to constrain them to converge to a specific distribution.
6.3.4 Optimization for POEM
In our POEM, what needs to be learned and updated are the unbinarized weights $\mathbf{w}_i$, the scale factor $\alpha_i$, and the other parameters $p_i$. These three kinds of parameters are jointly learned. In each Bi-FC layer, POEM sequentially updates the unbinarized weights $\mathbf{w}_i$ and the scale factor $\alpha_i$. For the other layers, we directly update the parameters $p_i$ through backpropagation.
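As a concrete illustration of this parameter split, the following PyTorch-style sketch defines a hypothetical BiFC module that keeps the unbinarized weights $\mathbf{w}_i$ and the scale factor $\alpha_i$ as separate learnable tensors, together with a helper that groups a model's parameters into the three kinds above. The class, function, and attribute names are illustrative, not taken from the original implementation.

```python
import torch
import torch.nn.functional as F


class BiFC(torch.nn.Module):
    """Hypothetical 1-bit fully connected layer with latent weights and a scale factor."""

    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = torch.nn.Parameter(0.01 * torch.randn(out_features, in_features))  # w_i
        self.alpha = torch.nn.Parameter(torch.ones(out_features, 1))                      # alpha_i

    def forward(self, x):
        # Binarize with a straight-through estimator so gradients still reach w_i.
        b_w = (torch.sign(self.weight) - self.weight).detach() + self.weight
        return F.linear(x, self.alpha * b_w)


def split_parameters(model):
    """Group parameters into (w_i, alpha_i, p_i) as described in the text."""
    w, alpha, p = [], [], []
    for module in model.modules():
        if isinstance(module, BiFC):
            w.append(module.weight)
            alpha.append(module.alpha)
        else:
            p.extend(module.parameters(recurse=False))
    return w, alpha, p
```

An optimizer can then treat the three groups differently, applying the POEM-specific updates described below to $\mathbf{w}_i$ and $\alpha_i$, while $p_i$ is trained by ordinary backpropagation.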
Updating $\mathbf{w}_i$ via Expectation-Maximization: A conventional binarization framework learns the weights $\mathbf{w}_i$ based on Eq. 6.44. The update quantity $\delta_{\mathbf{w}_i}$ corresponding to $\mathbf{w}_i$ is defined as
$$\delta_{\mathbf{w}_i} = \frac{\partial L_S}{\partial \mathbf{w}_i} + \lambda \frac{\partial L_R}{\partial \mathbf{w}_i}, \tag{6.45}$$
$$\mathbf{w}_i \leftarrow \mathbf{w}_i - \eta\,\delta_{\mathbf{w}_i}, \tag{6.46}$$
where $L_S$ and $L_R$ are loss functions and $\eta$ is the learning rate. $\frac{\partial L_S}{\partial \mathbf{w}_i}$ can be computed by backpropagation, and, furthermore, we have
$$\frac{\partial L_R}{\partial \mathbf{w}_i} = (\mathbf{w}_i - \alpha_i \circ b^{\mathbf{w}_i}) \circ \alpha_i. \tag{6.47}$$
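To make Eqs. 6.45–6.47 concrete, the following sketch performs one update of the unbinarized weights of a single Bi-FC layer, assuming `grad_LS` holds $\partial L_S/\partial \mathbf{w}_i$ obtained by backpropagation; the default values of $\lambda$ and $\eta$ are placeholders, not values from the original work.

```python
import torch


def poem_weight_step(w, alpha, grad_LS, lam=1e-4, eta=1e-3):
    """One update of the unbinarized weights w_i following Eqs. 6.45-6.46.

    w       : unbinarized weights w_i            (out_features, in_features)
    alpha   : channel-wise scale factor alpha_i  (out_features, 1)
    grad_LS : dL_S/dw_i from backpropagation     (same shape as w)
    """
    b_w = torch.sign(w)                      # binarized weights b^{w_i}
    grad_LR = (w - alpha * b_w) * alpha      # Eq. 6.47
    delta = grad_LS + lam * grad_LR          # Eq. 6.45
    return w - eta * delta                   # Eq. 6.46
```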
However, this backpropagation process without the necessary constraint will result in a Gaussian distribution of $\mathbf{w}_i$, which degrades the robustness of Bi-FCs as revealed in Eq. 6.80. Our POEM takes another learning objective,
$$\arg\min_{\mathbf{w}_i} \left\| b^{\mathbf{w}_i} - b^{\mathbf{w}_i + \gamma} \right\|. \tag{6.48}$$
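The effect that Eq. 6.48 penalizes can be checked numerically: weights concentrated around zero flip sign under a small disturbance $\gamma$, whereas a bimodal distribution away from zero barely changes. The sample distributions and the value of $\gamma$ below are made up for the illustration.

```python
import torch

torch.manual_seed(0)
gamma = 0.05                                     # small disturbance gamma

w_gaussian = 0.02 * torch.randn(10_000)          # unimodal weights around zero
w_bimodal = 0.3 * torch.sign(torch.randn(10_000)) + 0.02 * torch.randn(10_000)

def flip_rate(w):
    """Fraction of weights whose binarization changes under the disturbance."""
    return (torch.sign(w) != torch.sign(w + gamma)).float().mean().item()

print(flip_rate(w_gaussian))   # large: many signs flip, so the objective in Eq. 6.48 is large
print(flip_rate(w_bimodal))    # close to zero: the binarization is robust
```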
To learn Bi-FCs capable of overcoming this obstacle, we introduce the EM algorithm into the update of $\mathbf{w}_i$. First, we assume that the ideal distribution of $\mathbf{w}_i$ should be bimodal.
Assumption 6.3.1. For every unbinarized weight of the $i$-th 1-bit layer, i.e., $\forall w_i^j \in \mathbf{w}_i$, it can be constrained to follow a Gaussian Mixture Model (GMM).
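Under Assumption 6.3.1, the unbinarized weights of a layer can be fitted with a two-component GMM via Expectation-Maximization; a minimal one-dimensional EM loop is sketched below. This is only an illustrative fit (it assumes weights of both signs are present), not the chapter's EM(·) operator, which additionally leaves the weights outside the two fitted means unconstrained and only pulls the middle ones toward a mode, as described in the caption of Figure 6.5.

```python
import torch


def fit_gmm_em(w, n_iter=50):
    """Fit a 2-component 1-D GMM to the flattened weights w via EM.

    Returns the component means, standard deviations, and mixing weights.
    """
    w = w.flatten()
    # Initialize one mode on each side of zero (assumes both signs occur).
    mu = torch.stack([w[w < 0].mean(), w[w >= 0].mean()])
    sigma = torch.stack([w.std(), w.std()])
    pi = torch.tensor([0.5, 0.5])

    for _ in range(n_iter):
        # E-step: responsibility of each component for each weight.
        diff = w.unsqueeze(1) - mu                                    # (N, 2)
        log_prob = -0.5 * (diff / sigma) ** 2 - torch.log(sigma) + torch.log(pi)
        resp = torch.softmax(log_prob, dim=1)                         # (N, 2)

        # M-step: re-estimate means, standard deviations, and mixing weights.
        nk = resp.sum(dim=0)                                          # (2,)
        mu = (resp * w.unsqueeze(1)).sum(dim=0) / nk
        sigma = ((resp * (w.unsqueeze(1) - mu) ** 2).sum(dim=0) / nk).sqrt().clamp_min(1e-6)
        pi = nk / w.numel()

    return mu, sigma, pi
```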